Maven Communications Case Study¶

Introduction¶

Maven Communications is a California-based Telecommunications company, we would be assuming the role of a Business Intelligence(BI) Consultant.

Objective - The objective of the case study is to help the company - Maven Communications to improve retention by identifying high value customers and churn risks, and have been asked to present your findings to the CMO in the form of a single page report or dashboard.

Methods - We will be using Python for this case study and Pandas package will helps us in manipulating the data as per our requirement, with this we will also be using the Holoviz, which is a collection of multiple different high level python plotting libraries.

We will be performing data analysis step-by-step starting with the:

STAGE-A
1. Prepare Phase - We will collect the data and make it appropriate for the futher stages.
2. Process Phase - A deep level of data cleaning and wrangling.

STAGE-B
3. Analysis Phase - In this we will be exploring the dataset and understanding the better way to solve some of the questions.
4. Share Phase - Making our analysis capable enough for any one to understand.
5. Act Phase - Suggestion and implementation on the basis of the analysis done in the above phase.

STAGE-A¶

Loading and Wrangling Data¶

In [1]:
#Loading all the packages needed for this analysis

#pandas for data wrangling/preparation
import pandas as pd 

#panel for dashboard
import panel as pn 
pn.extension(sizing_mode = 'stretch_width')

#collection of different tools
import holoviews as hv 
hv.extension('bokeh')

#Plotting api
import hvplot.pandas 

#geoviews for geo plot
import geoviews as gv
from geoviews import opts, tile_sources as gvts
import cartopy.crs as ccrs

#to ignore warnings
import warnings 
warnings.filterwarnings("ignore")

#operating system dependent functionality
import os
In [2]:
#looking for current working directory
directory = os.getcwd()
directory
Out[2]:
'/home/john/Downloads/Maven-Telecom-Analysis-main'
In [3]:
#Reading the customer data
tele_cust_df = pd.read_csv(f'{directory}/Data/telecom_customer_churn.csv') 
In [4]:
#Setting Max no. of columns to view the complete data
pd.set_option('display.max_columns', 38)
In [5]:
#Head part of the Customer Data
tele_cust_df.head()
Out[5]:
Customer ID Gender Age Married Number of Dependents City Zip Code Latitude Longitude Number of Referrals Tenure in Months Offer Phone Service Avg Monthly Long Distance Charges Multiple Lines Internet Service Internet Type Avg Monthly GB Download Online Security Online Backup Device Protection Plan Premium Tech Support Streaming TV Streaming Movies Streaming Music Unlimited Data Contract Paperless Billing Payment Method Monthly Charge Total Charges Total Refunds Total Extra Data Charges Total Long Distance Charges Total Revenue Customer Status Churn Category Churn Reason
0 0002-ORFBO Female 37 Yes 0 Frazier Park 93225 34.827662 -118.999073 2 9 None Yes 42.39 No Yes Cable 16.0 No Yes No Yes Yes No No Yes One Year Yes Credit Card 65.6 593.30 0.00 0 381.51 974.81 Stayed NaN NaN
1 0003-MKNFE Male 46 No 0 Glendale 91206 34.162515 -118.203869 0 9 None Yes 10.69 Yes Yes Cable 10.0 No No No No No Yes Yes No Month-to-Month No Credit Card -4.0 542.40 38.33 10 96.21 610.28 Stayed NaN NaN
2 0004-TLHLJ Male 50 No 0 Costa Mesa 92627 33.645672 -117.922613 0 4 Offer E Yes 33.65 No Yes Fiber Optic 30.0 No No Yes No No No No Yes Month-to-Month Yes Bank Withdrawal 73.9 280.85 0.00 0 134.60 415.45 Churned Competitor Competitor had better devices
3 0011-IGKFF Male 78 Yes 0 Martinez 94553 38.014457 -122.115432 1 13 Offer D Yes 27.82 No Yes Fiber Optic 4.0 No Yes Yes No Yes Yes No Yes Month-to-Month Yes Bank Withdrawal 98.0 1237.85 0.00 0 361.66 1599.51 Churned Dissatisfaction Product dissatisfaction
4 0013-EXCHZ Female 75 Yes 0 Camarillo 93010 34.227846 -119.079903 3 3 None Yes 7.38 No Yes Fiber Optic 11.0 No No No Yes Yes No No Yes Month-to-Month Yes Credit Card 83.9 267.40 0.00 0 22.14 289.54 Churned Dissatisfaction Network reliability
In [6]:
# Info of our columns
tele_cust_df.info(memory_usage = 'deep')
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 38 columns):
 #   Column                             Non-Null Count  Dtype  
---  ------                             --------------  -----  
 0   Customer ID                        7043 non-null   object 
 1   Gender                             7043 non-null   object 
 2   Age                                7043 non-null   int64  
 3   Married                            7043 non-null   object 
 4   Number of Dependents               7043 non-null   int64  
 5   City                               7043 non-null   object 
 6   Zip Code                           7043 non-null   int64  
 7   Latitude                           7043 non-null   float64
 8   Longitude                          7043 non-null   float64
 9   Number of Referrals                7043 non-null   int64  
 10  Tenure in Months                   7043 non-null   int64  
 11  Offer                              7043 non-null   object 
 12  Phone Service                      7043 non-null   object 
 13  Avg Monthly Long Distance Charges  6361 non-null   float64
 14  Multiple Lines                     6361 non-null   object 
 15  Internet Service                   7043 non-null   object 
 16  Internet Type                      5517 non-null   object 
 17  Avg Monthly GB Download            5517 non-null   float64
 18  Online Security                    5517 non-null   object 
 19  Online Backup                      5517 non-null   object 
 20  Device Protection Plan             5517 non-null   object 
 21  Premium Tech Support               5517 non-null   object 
 22  Streaming TV                       5517 non-null   object 
 23  Streaming Movies                   5517 non-null   object 
 24  Streaming Music                    5517 non-null   object 
 25  Unlimited Data                     5517 non-null   object 
 26  Contract                           7043 non-null   object 
 27  Paperless Billing                  7043 non-null   object 
 28  Payment Method                     7043 non-null   object 
 29  Monthly Charge                     7043 non-null   float64
 30  Total Charges                      7043 non-null   float64
 31  Total Refunds                      7043 non-null   float64
 32  Total Extra Data Charges           7043 non-null   int64  
 33  Total Long Distance Charges        7043 non-null   float64
 34  Total Revenue                      7043 non-null   float64
 35  Customer Status                    7043 non-null   object 
 36  Churn Category                     1869 non-null   object 
 37  Churn Reason                       1869 non-null   object 
dtypes: float64(9), int64(6), object(23)
memory usage: 9.7 MB
In [7]:
#Checking null values in all the columns.
tele_cust_df.isna().sum()
Out[7]:
Customer ID                             0
Gender                                  0
Age                                     0
Married                                 0
Number of Dependents                    0
City                                    0
Zip Code                                0
Latitude                                0
Longitude                               0
Number of Referrals                     0
Tenure in Months                        0
Offer                                   0
Phone Service                           0
Avg Monthly Long Distance Charges     682
Multiple Lines                        682
Internet Service                        0
Internet Type                        1526
Avg Monthly GB Download              1526
Online Security                      1526
Online Backup                        1526
Device Protection Plan               1526
Premium Tech Support                 1526
Streaming TV                         1526
Streaming Movies                     1526
Streaming Music                      1526
Unlimited Data                       1526
Contract                                0
Paperless Billing                       0
Payment Method                          0
Monthly Charge                          0
Total Charges                           0
Total Refunds                           0
Total Extra Data Charges                0
Total Long Distance Charges             0
Total Revenue                           0
Customer Status                         0
Churn Category                       5174
Churn Reason                         5174
dtype: int64
In [8]:
# Checking whether the NaN values are 'No' or 'None' as per the Data Dictionary.
tele_cust_df.loc[tele_cust_df['Phone Service'] == 'No', ['Avg Monthly Long Distance Charges', 'Multiple Lines']]
Out[8]:
Avg Monthly Long Distance Charges Multiple Lines
10 NaN NaN
14 NaN NaN
16 NaN NaN
19 NaN NaN
25 NaN NaN
... ... ...
6979 NaN NaN
6980 NaN NaN
6996 NaN NaN
7016 NaN NaN
7042 NaN NaN

682 rows × 2 columns

In [9]:
# Checking whether the NaN values are 'No' or 'None' as per the Data Dictionary.
tele_cust_df.loc[tele_cust_df['Internet Service'] == 'No', tele_cust_df.columns[16:26]]
Out[9]:
Internet Type Avg Monthly GB Download Online Security Online Backup Device Protection Plan Premium Tech Support Streaming TV Streaming Movies Streaming Music Unlimited Data
20 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
23 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
24 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
27 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
28 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ...
7026 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
7028 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
7032 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
7033 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
7037 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

1526 rows × 10 columns

In [10]:
# custmer segregation
tele_cust_df['Customer Status'].value_counts()
Out[10]:
Stayed     4720
Churned    1869
Joined      454
Name: Customer Status, dtype: int64

After going through all the null values and the data dictionary, I have understood the reason behind the null values.

All the missing or null values need to be replaced with 0, No and None.

In [11]:
#Replacing NaN values with appropriate values
tele_cust_df.fillna({"Avg Monthly Long Distance Charges": 0, 
                     "Multiple Lines": 'No', 
                     "Internet Type": 'None', 
                     "Avg Monthly GB Download": 0, 
                     "Online Security": 'No', 
                     "Online Backup": 'No', 
                     "Device Protection Plan": 'No', 
                     "Premium Tech Support": 'No', 
                     "Streaming TV": 'No', 
                     "Streaming Movies": 'No', 
                     "Streaming Music": 'No', 
                     "Unlimited Data": 'No', 
                     "Churn Category": 'None', 
                     "Churn Reason": 'None'}, inplace = True)
In [12]:
# There should be 0 Nan values
tele_cust_df.isna().sum()
Out[12]:
Customer ID                          0
Gender                               0
Age                                  0
Married                              0
Number of Dependents                 0
City                                 0
Zip Code                             0
Latitude                             0
Longitude                            0
Number of Referrals                  0
Tenure in Months                     0
Offer                                0
Phone Service                        0
Avg Monthly Long Distance Charges    0
Multiple Lines                       0
Internet Service                     0
Internet Type                        0
Avg Monthly GB Download              0
Online Security                      0
Online Backup                        0
Device Protection Plan               0
Premium Tech Support                 0
Streaming TV                         0
Streaming Movies                     0
Streaming Music                      0
Unlimited Data                       0
Contract                             0
Paperless Billing                    0
Payment Method                       0
Monthly Charge                       0
Total Charges                        0
Total Refunds                        0
Total Extra Data Charges             0
Total Long Distance Charges          0
Total Revenue                        0
Customer Status                      0
Churn Category                       0
Churn Reason                         0
dtype: int64

All the missing values are replaced and the data is now complete

In [13]:
#checking for any duplicate values
tele_cust_df.duplicated().sum()
Out[13]:
0

0 indicates no duplicate value

In [14]:
#changing the data types of the columns
col_list = tele_cust_df.columns
col_list = col_list[[1, 3, 11, 12, 14, 15, 16, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 35, 36, 37]].to_list()
tele_cust_df[col_list] = tele_cust_df[col_list].astype('category')
tele_cust_df.info(memory_usage = 'deep')
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 38 columns):
 #   Column                             Non-Null Count  Dtype   
---  ------                             --------------  -----   
 0   Customer ID                        7043 non-null   object  
 1   Gender                             7043 non-null   category
 2   Age                                7043 non-null   int64   
 3   Married                            7043 non-null   category
 4   Number of Dependents               7043 non-null   int64   
 5   City                               7043 non-null   object  
 6   Zip Code                           7043 non-null   int64   
 7   Latitude                           7043 non-null   float64 
 8   Longitude                          7043 non-null   float64 
 9   Number of Referrals                7043 non-null   int64   
 10  Tenure in Months                   7043 non-null   int64   
 11  Offer                              7043 non-null   category
 12  Phone Service                      7043 non-null   category
 13  Avg Monthly Long Distance Charges  7043 non-null   float64 
 14  Multiple Lines                     7043 non-null   category
 15  Internet Service                   7043 non-null   category
 16  Internet Type                      7043 non-null   category
 17  Avg Monthly GB Download            7043 non-null   float64 
 18  Online Security                    7043 non-null   category
 19  Online Backup                      7043 non-null   category
 20  Device Protection Plan             7043 non-null   category
 21  Premium Tech Support               7043 non-null   category
 22  Streaming TV                       7043 non-null   category
 23  Streaming Movies                   7043 non-null   category
 24  Streaming Music                    7043 non-null   category
 25  Unlimited Data                     7043 non-null   category
 26  Contract                           7043 non-null   category
 27  Paperless Billing                  7043 non-null   category
 28  Payment Method                     7043 non-null   category
 29  Monthly Charge                     7043 non-null   float64 
 30  Total Charges                      7043 non-null   float64 
 31  Total Refunds                      7043 non-null   float64 
 32  Total Extra Data Charges           7043 non-null   int64   
 33  Total Long Distance Charges        7043 non-null   float64 
 34  Total Revenue                      7043 non-null   float64 
 35  Customer Status                    7043 non-null   category
 36  Churn Category                     7043 non-null   category
 37  Churn Reason                       7043 non-null   category
dtypes: category(21), float64(9), int64(6), object(2)
memory usage: 1.8 MB

After changing the datatypes to category the memory usage is dropped from ~10 MiB to ~1 MiB

With this Importing and wrangling is done.


STAGE-B¶

Exploratory Data Analysis(EDA) and Visualization¶

Understaning, the objective and performing analysis and visualization

  • Identifying high value customers
  • Churn risks
In [15]:
#descriptive statistics of the dataframe
tele_cust_df.describe()
Out[15]:
Age Number of Dependents Zip Code Latitude Longitude Number of Referrals Tenure in Months Avg Monthly Long Distance Charges Avg Monthly GB Download Monthly Charge Total Charges Total Refunds Total Extra Data Charges Total Long Distance Charges Total Revenue
count 7043.000000 7043.000000 7043.000000 7043.000000 7043.000000 7043.000000 7043.000000 7043.000000 7043.000000 7043.000000 7043.000000 7043.000000 7043.000000 7043.000000 7043.000000
mean 46.509726 0.468692 93486.070567 36.197455 -119.756684 1.951867 32.386767 22.958954 20.515405 63.596131 2280.381264 1.962182 6.860713 749.099262 3034.379056
std 16.750352 0.962802 1856.767505 2.468929 2.154425 3.001199 24.542061 15.448113 20.418940 31.204743 2266.220462 7.902614 25.104978 846.660055 2865.204542
min 19.000000 0.000000 90001.000000 32.555828 -124.301372 0.000000 1.000000 0.000000 0.000000 -10.000000 18.800000 0.000000 0.000000 0.000000 21.360000
25% 32.000000 0.000000 92101.000000 33.990646 -121.788090 0.000000 9.000000 9.210000 3.000000 30.400000 400.150000 0.000000 0.000000 70.545000 605.610000
50% 46.000000 0.000000 93518.000000 36.205465 -119.595293 0.000000 29.000000 22.890000 17.000000 70.050000 1394.550000 0.000000 0.000000 401.440000 2108.640000
75% 60.000000 0.000000 95329.000000 38.161321 -117.969795 3.000000 55.000000 36.395000 27.000000 89.750000 3786.600000 0.000000 0.000000 1191.100000 4801.145000
max 80.000000 9.000000 96150.000000 41.962127 -114.192901 11.000000 72.000000 49.990000 85.000000 118.750000 8684.800000 49.790000 150.000000 3564.720000 11979.340000
In [16]:
#Radio Button Widget to control the plots
select_widget = pn.widgets.Select(name = 'Select to apply', 
                                  options = ['Total Revenue(10e3)', 
                                             'Total Refunds', 
                                             'Total Customer'],
                                  value = 'Total Revenue(10e3)')
select_widget
Out[16]:
In [17]:
#Checking whether the widget is working or not.
def display_1(a):
    return f'Radio Button Value: {a}'

pn.Column(
    select_widget,
    pn.bind(display_1, a = select_widget)
)
Out[17]:

1. Top Cities by Total Customer, Revenue and Refund.¶

In [18]:
#Grouping the data on city and renaming the columns.
cust_rev_df = tele_cust_df.groupby('City', as_index = False)\
.agg({'Latitude': 'mean', 
      'Longitude': 'mean',
      'Customer ID': 'count', 
      'Total Revenue': 'sum', 
      'Total Refunds': 'sum'})
cust_rev_df['Total Revenue'] = (cust_rev_df['Total Revenue'] / 1000)
cust_rev_df.rename(columns = {'Customer ID': 'Total Customer', 'Total Revenue': 'Total Revenue(10e3)'}, 
                   inplace = True)
cust_rev_df
Out[18]:
City Latitude Longitude Total Customer Total Revenue(10e3) Total Refunds
0 Acampo 38.200231 -121.235034 4 18.10796 1.27
1 Acton 34.501452 -118.207862 4 12.15636 0.00
2 Adelanto 34.667815 -117.536183 5 18.23549 48.23
3 Adin 41.171578 -120.913161 4 5.53938 0.00
4 Agoura Hills 34.129058 -118.759788 5 10.64188 0.00
... ... ... ... ... ... ...
1101 Yreka 41.764869 -122.671316 4 17.46767 0.00
1102 Yuba City 39.051552 -121.660521 8 17.86782 123.31
1103 Yucaipa 34.045970 -117.011825 4 29.76580 0.00
1104 Yucca Valley 34.159534 -116.425984 5 12.37469 40.41
1105 Zenia 40.170357 -123.417298 4 12.73338 0.00

1106 rows × 6 columns

In [19]:
#creating geo plot and table and this can be controlled by the widget
def cust_rev_geoplot(select_widget):
    return cust_rev_df.sort_values(select_widget, ascending = True)\
           .hvplot.points('Longitude', 
                          'Latitude', 
                          xaxis = None, 
                          yaxis = None, 
                          geo=True, 
                          tiles = 'OSM', 
                          frame_width = 550, 
                          color = select_widget, 
                          cmap = 'plasma', 
                          size = select_widget, 
                          hover_cols = ['City'], 
                          clabel = select_widget, 
                          title = f'{select_widget} as per Cities')

cust_rev_bind = pn.bind(cust_rev_geoplot, select_widget)

pn.Column(select_widget, cust_rev_bind)
Out[19]:

Note: Cities that have the highest revenue and the most customer are Los Angeles, San Diego, Sacramento, San Jose and San Francisco

2. Customer Status and Contract as per Tenure¶

In [20]:
# Creating a new column 'Tenure' for a ordinal category.
tele_cust_df['Tenure'] = pd.cut(tele_cust_df['Tenure in Months'], 
                               bins = [0, 12, 24, 36, 48, 60, 72],
                               labels = ['Less than 1 Year', '1 - 2 Years', '2 - 3 Years', '3 - 4 Years', 
                                         '4 - 5 Years', '5 - 6 Years'])
tele_cust_df['Tenure'].unique()
Out[20]:
['Less than 1 Year', '1 - 2 Years', '5 - 6 Years', '4 - 5 Years', '2 - 3 Years', '3 - 4 Years']
Categories (6, object): ['Less than 1 Year' < '1 - 2 Years' < '2 - 3 Years' < '3 - 4 Years' < '4 - 5 Years' < '5 - 6 Years']
In [21]:
#Grouping on Tenure and Customer Status and renaming the columns
rev_refund_df_1 = tele_cust_df.groupby(['Tenure', 'Customer Status'])\
.agg({'Customer ID': 'count', 
      'Total Revenue': 'sum', 
      'Total Refunds': 'sum'}).reset_index()
rev_refund_df_1['Total Revenue'] = rev_refund_df_1['Total Revenue'] / 1000
rev_refund_df_1.rename(columns = {'Customer ID': 'Total Customer', 'Total Revenue': 'Total Revenue(10e3)'}, 
                       inplace = True)
rev_refund_df_1
Out[21]:
Tenure Customer Status Total Customer Total Revenue(10e3) Total Refunds
0 Less than 1 Year Churned 1037 397.41770 1033.93
1 Less than 1 Year Joined 454 54.27975 108.35
2 Less than 1 Year Stayed 695 396.26792 990.50
3 1 - 2 Years Churned 294 524.74053 708.75
4 1 - 2 Years Joined 0 0.00000 0.00
5 1 - 2 Years Stayed 730 1072.81979 1712.10
6 2 - 3 Years Churned 180 585.68371 432.36
7 2 - 3 Years Joined 0 0.00000 0.00
8 2 - 3 Years Stayed 652 1645.92039 1667.24
9 3 - 4 Years Churned 145 659.78713 332.90
10 3 - 4 Years Joined 0 0.00000 0.00
11 3 - 4 Years Stayed 617 2218.43720 1762.97
12 4 - 5 Years Churned 120 742.74710 217.84
13 4 - 5 Years Joined 0 0.00000 0.00
14 4 - 5 Years Stayed 712 3569.17219 1697.60
15 5 - 6 Years Churned 93 774.08365 113.20
16 5 - 6 Years Joined 0 0.00000 0.00
17 5 - 6 Years Stayed 1314 8729.77463 3041.91
In [22]:
#Grouping on Tenure and Contract and renaming the columns
rev_refund_df_2 = tele_cust_df.groupby(['Tenure', 'Contract'])\
.agg({'Customer ID': 'count', 
      'Total Revenue': 'sum', 
      'Total Refunds': 'sum'}).reset_index()
rev_refund_df_2['Total Revenue'] = rev_refund_df_2['Total Revenue'] / 1000
rev_refund_df_2.rename(columns = {'Customer ID': 'Total Customer', 'Total Revenue': 'Total Revenue(10e3)'}, 
                       inplace = True)
rev_refund_df_2
Out[22]:
Tenure Contract Total Customer Total Revenue(10e3) Total Refunds
0 Less than 1 Year Month-to-Month 1913 719.67822 1695.31
1 Less than 1 Year One Year 146 69.72779 271.63
2 Less than 1 Year Two Year 127 58.55936 165.84
3 1 - 2 Years Month-to-Month 686 1141.09855 1561.33
4 1 - 2 Years One Year 215 301.83451 539.23
5 1 - 2 Years Two Year 123 154.62726 320.29
6 2 - 3 Years Month-to-Month 433 1261.29955 1202.25
7 2 - 3 Years One Year 268 676.77707 357.64
8 2 - 3 Years Two Year 131 293.52748 539.71
9 3 - 4 Years Month-to-Month 282 1184.82725 706.99
10 3 - 4 Years One Year 278 1055.14521 738.98
11 3 - 4 Years Two Year 202 638.25187 649.90
12 4 - 5 Years Month-to-Month 209 1191.69267 474.09
13 4 - 5 Years One Year 326 1787.33850 703.57
14 4 - 5 Years Two Year 297 1332.88812 737.78
15 5 - 6 Years Month-to-Month 87 663.89198 121.62
16 5 - 6 Years One Year 317 2280.97123 679.28
17 5 - 6 Years Two Year 1003 6558.99507 2354.21
In [23]:
#Ploting two horizontal bar, one for customer status and the other for contract.
#Both can be controlled by the Radio Button widget
def rev_refund_plot_1(select_widget):
    return rev_refund_df_1.hvplot.barh(x = 'Tenure', 
                                       y = select_widget, 
                                       by = 'Customer Status', 
                                       legend = 'top', 
                                       cmap = 'Dark2', 
                                       stacked = True, 
                                       height = 400,
                                       title = f'{select_widget} vs. Tenure as per the Customer Status')

def rev_refund_plot_2(select_widget):
    return rev_refund_df_2.hvplot.barh(x = 'Tenure', 
                                       y = select_widget, 
                                       by = 'Contract', 
                                       legend = 'top', 
                                       cmap = 'Set1', 
                                       stacked = True, 
                                       height = 400,
                                       title = f'{select_widget} vs. Tenure as per the Contract')

rev_refund_bind_1 = pn.bind(rev_refund_plot_1, select_widget)
rev_refund_bind_2 = pn.bind(rev_refund_plot_2, select_widget)

pn.Column(select_widget, pn.Row(rev_refund_bind_1), pn.Row(rev_refund_bind_2))
Out[23]:

Note:

  • Most of the customers churned(1037) with in the first year
  • All "stayed" customer(4720) are high value customer, but the customers with tenure of 5-6 Years(1314) are the most loyal customers.

3. Chrun Risk¶

In [24]:
# Grouping on churn category and churn reason(similar process as above) 
churn_count_df = tele_cust_df.groupby(['Churn Category', 'Churn Reason'])\
.agg({'Customer ID': 'count', 
      'Total Revenue': 'sum', 
      'Total Refunds': 'sum'}).reset_index()
churn_count_df['Total Revenue'] = churn_count_df['Total Revenue'] / 1000
churn_count_df.rename(columns = {'Customer ID': 'Total Customer', 'Total Revenue': 'Total Revenue(10e3)'}, 
                      inplace = True)
churn_count_df
Out[24]:
Churn Category Churn Reason Total Customer Total Revenue(10e3) Total Refunds
0 Attitude Attitude of service provider 94 174.85874 117.54
1 Attitude Attitude of support person 220 404.69539 202.04
2 Attitude Competitor had better devices 0 0.00000 0.00
3 Attitude Competitor made better offer 0 0.00000 0.00
4 Attitude Competitor offered higher download speeds 0 0.00000 0.00
... ... ... ... ... ...
121 Price Poor expertise of online support 0 0.00000 0.00
122 Price Poor expertise of phone support 0 0.00000 0.00
123 Price Price too high 78 170.17043 183.37
124 Price Product dissatisfaction 0 0.00000 0.00
125 Price Service dissatisfaction 0 0.00000 0.00

126 rows × 5 columns

In [25]:
# Droping out all the zero values and also the 'None' category and reason values
churn_count_df = churn_count_df.loc[(churn_count_df['Total Customer'] != 0) | 
                                    (churn_count_df['Total Revenue(10e3)'] != 0) | 
                                    (churn_count_df['Total Refunds'] != 0)].drop(78).reset_index(drop = True)
churn_count_df
Out[25]:
Churn Category Churn Reason Total Customer Total Revenue(10e3) Total Refunds
0 Attitude Attitude of service provider 94 174.85874 117.54
1 Attitude Attitude of support person 220 404.69539 202.04
2 Competitor Competitor had better devices 313 579.87705 485.84
3 Competitor Competitor made better offer 311 582.82220 396.38
4 Competitor Competitor offered higher download speeds 100 234.14569 227.94
5 Competitor Competitor offered more data 117 297.56821 301.39
6 Dissatisfaction Lack of self-service on Website 29 36.78815 35.34
7 Dissatisfaction Limited range of services 37 47.25443 18.39
8 Dissatisfaction Network reliability 72 132.44367 176.75
9 Dissatisfaction Poor expertise of online support 31 56.61714 93.98
10 Dissatisfaction Poor expertise of phone support 12 17.08599 85.68
11 Dissatisfaction Product dissatisfaction 77 204.87574 100.21
12 Dissatisfaction Service dissatisfaction 63 122.91414 131.19
13 Other Deceased 6 4.17283 0.00
14 Other Don't know 130 256.40335 146.13
15 Other Moved 46 93.81296 0.00
16 Price Extra data charges 39 104.15272 90.71
17 Price Lack of affordable download/upload speed 30 66.77276 11.05
18 Price Long distance charges 64 97.02823 35.05
19 Price Price too high 78 170.17043 183.37
In [26]:
# Plotting a horizontal bar that shows widgets vs churn reason
def churn_count_plot(select_widget):
    return churn_count_df.sort_values(select_widget)\
           .hvplot.barh(x = 'Churn Reason', 
                        y = select_widget, 
                        by = 'Churn Category', 
                        cmap = 'Dark2', 
                        stacked = True, 
                        title = f'{select_widget} vs. Churn Reason as per the Churn Category', 
                        height = 500, 
                        legend = 'bottom_right', 
                        ylim = (0, 600))

churn_count_bind = pn.bind(churn_count_plot, select_widget)

pn.Column(select_widget, churn_count_bind)
Out[26]:

Dashboard¶

Panel is capable to generate various types of dashboards using notebook. There are multiple different templates which are ready to use with less than 10 lines of codes but running the template will need a server, which will take us to a browser that breaks the workflow of our notebook. Template is good for the final phase of the development, but for now we will embed it in this notebook to get the feel of a dashboard.

In [27]:
# Dashboard title
dashboard_title = '# Maven Communications - Customer Churn Dashboard'
In [28]:
# Dashboard Description
dashboard_desc = """This dashboard displays the case study done on Maven Communications customer dataset, 
to understand the fundamental behavioral pattern of all the customers and help the Marketing team to make
a strategy on "Customer Retention"."""
In [29]:
# Compiling the dashboard with all the above plots and widget to control the plots
dashboard = pn.Column(dashboard_title,
                      dashboard_desc,
                      select_widget,
                      cust_rev_bind,
                      rev_refund_bind_1,
                      rev_refund_bind_2,
                      churn_count_bind
                      )
In [31]:
# Embedding and displaying the dashboard
dashboard.embed();